Markowitz model and the efficient frontier¶

In [1]:
import yfinance as yf
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
# get data from yahoo finance 
data = yf.download("VUG BND XLV MCHI VT EWT VGLT IEMG VBK", period="max")["Close"]
data.dropna(inplace=True)
df = pd.concat([data], axis=1)
df.columns = ["VUG", "BND", "XLV", "MCHI", "VT", "EWT", "VGLT", "IEMG", "VBK"]
[*********************100%***********************]  9 of 9 completed

Info:¶

  • VUG Vanguard Growth ETF (https://investor.vanguard.com/investment-products/etfs/profile/vug)
  • BND Vanguard Total Bond Market ETF (https://investor.vanguard.com/investment-products/etfs/profile/bnd#overview)
  • XLV Health Care Select Sector SPDR Fund (https://www.sectorspdrs.com/mainfund/xlv)
  • MCHI iShares MSCI China ETF (https://www.ishares.com/us/products/239619/ishares-msci-china-etf)
  • VT Vanguard Total World Stock ETF (https://investor.vanguard.com/investment-products/etfs/profile/vt)
  • EWT Ishares Msci Taiwan ETF (https://www.ishares.com/us/products/239686/ishares-msci-taiwan-etf)
  • VGLT Vanguard Long-Term Treasury Index Fund ETF (https://investor.vanguard.com/investment-products/etfs/profile/vglt)
  • IEGM iShares Core MSCI Emerging Markets ETF (https://www.ishares.com/us/products/244050/ishares-core-msci-emerging-markets-etf)
  • VBK Vanguard Small-Cap Growth ETF (https://investor.vanguard.com/investment-products/etfs/profile/vbk)
In [3]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2676 entries, 2012-10-24 to 2023-06-14
Data columns (total 9 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   BND     2676 non-null   float64
 1   EWT     2676 non-null   float64
 2   IEMG    2676 non-null   float64
 3   MCHI    2676 non-null   float64
 4   VBK     2676 non-null   float64
 5   VGLT    2676 non-null   float64
 6   VT      2676 non-null   float64
 7   VUG     2676 non-null   float64
 8   XLV     2676 non-null   float64
dtypes: float64(9)
memory usage: 209.1 KB
In [4]:
# plot historical data
for single_data in data:
    plt.figure(figsize=(8, 4), dpi=100)
    plt.plot(data[single_data], color="blue")
    plt.title(single_data)
    plt.xlabel("Time")
    plt.ylabel("Price")
    plt.grid(True)
    plt.show()
In [5]:
# log-returns of the assets
log_returns = np.log(df/df.shift(1))
log_returns.dropna(inplace=True)
log_returns.head(5)
Out[5]:
VUG BND XLV MCHI VT EWT VGLT IEMG VBK
Date
2012-10-25 -0.001654 0.001571 0.006614 0.005814 0.003506 -0.005564 0.006277 0.000856 0.007937
2012-10-26 0.002126 -0.010257 -0.007237 -0.011888 -0.003271 0.011755 -0.003134 0.000570 -0.003960
2012-10-31 0.001651 0.000000 0.000830 0.000902 0.008159 0.007847 -0.001047 -0.000713 -0.007218
2012-11-01 -0.001533 0.012609 0.015839 0.029322 0.015549 -0.004570 0.014763 0.012611 0.008210
2012-11-02 0.000354 -0.010232 -0.009020 -0.004828 -0.017291 -0.005117 -0.010582 -0.009479 -0.005466
In [6]:
log_returns.tail(5)
Out[6]:
VUG BND XLV MCHI VT EWT VGLT IEMG VBK
Date
2023-06-08 0.004964 0.001267 0.007128 0.011704 -0.000980 0.011467 0.006105 0.009864 0.006404
2023-06-09 -0.002065 0.004844 0.004252 -0.001318 -0.007469 -0.000792 0.001154 0.003070 0.003072
2023-06-12 0.002065 0.008993 0.002623 0.000000 0.010894 0.001900 0.006686 0.014262 0.003597
2023-06-13 -0.004411 0.014674 0.009228 0.013754 0.013804 -0.009536 0.007779 0.007037 0.005030
2023-06-14 0.000829 -0.002054 0.006766 0.015491 -0.006988 0.005891 0.001549 0.004796 -0.010315

Correlation matrix¶

In [7]:
plt.figure(figsize=(10,6), dpi=100)
sns.heatmap(log_returns.corr(), cmap="Reds", linecolor="black", linewidth=0.1, annot=True);
In [8]:
np.random.seed(70)
n_portfolios = 10000
all_weights = np.zeros((n_portfolios, len(df.columns)))
ret_arr = np.zeros(n_portfolios)
vol_arr = np.zeros(n_portfolios)
sharpe_arr = np.zeros(n_portfolios)
In [9]:
# portfolios simulation(n_portoflios=10000)
for x in range(n_portfolios):
    weights = np.array(np.random.random(9))
    weights = weights / np.sum(weights)
    all_weights[x, :] = weights
    ret_arr[x] = np.sum((log_returns.mean() * weights * 252))
    vol_arr[x] = np.sqrt(np.dot(weights.T, np.dot(log_returns.cov() * 252, weights)))
    sharpe_arr[x] = ret_arr[x] / vol_arr[x]
In [10]:
print('Max sharpe ratio in the array is: {}'.format(sharpe_arr.max()))
print("It's located in the array: {}".format(sharpe_arr.argmax()))
reddot = (sharpe_arr.argmax())

max_sr_ret = ret_arr[sharpe_arr.argmax()]
max_sr_vol = vol_arr[sharpe_arr.argmax()]

print(all_weights[reddot, :])
Max sharpe ratio in the array is: 0.5845536385756669
It's located in the array: 9907
[0.09245677 0.05478004 0.00409467 0.00185034 0.00483573 0.27611547
 0.06868442 0.24107848 0.25610407]

Plotting with ploty.express¶

In [11]:
import plotly.express as px
In [12]:
df = pd.DataFrame({'Volatility': vol_arr, 'Return': ret_arr, 'Sharpe Ratio': sharpe_arr.round(2)})
fig = px.scatter(df, x="Volatility", y="Return",
                 color='Sharpe Ratio',title = 'Efficient Frontier')
fig.show()